On the choice of the number of blocks with the incremental EM algorithm for the fitting of normal mixtures
نویسندگان
چکیده
The EM algorithm is a popular method for parameter estimation in situations where the data can be viewed as being incomplete. As each E-step visits each data point on a given iteration, the EM algorithm requires considerable computation time in its application to large data sets. Two versions, the incremental EM (IEM) algorithm and a sparse version of the EM algorithm, were proposed recently by Neal R.M. and Hinton G.E. in Jordan M.I. (Ed.), Learning in Graphical Models, Kluwer, Dordrecht, 1998, pp. 355–368 to reduce the computational cost of applying the EM algorithm. With the IEM algorithm, the available n observations are divided into B (B ≤ n) blocks and the E-step is implemented for only a block of observations at a time before the next M-step is performed. With the sparse version of the EM algorithm for the fitting of mixture models, only those posterior probabilities of component membership of the mixture that are above a specified threshold are updated; the remaining component-posterior probabilities are held fixed. In this paper, simulations are performed to assess the relative performances of the IEM algorithm with various number of blocks and the standard EM algorithm. In particular, we propose a simple rule for choosing the number of blocks with the IEM algorithm. For the IEM algorithm in the extreme case of one observation per block, we provide efficient updating formulas, which avoid the direct calculation of the inverses and determinants of the component-covariance matrices. Moreover, a sparse version of the IEM algorithm (SPIEM) is formulated by combining the sparse E-step of the EM algorithm and the partial E-step of the IEM algorithm. This SPIEM algorithm can further reduce the computation time of the IEM algorithm.
منابع مشابه
Distributed and Cooperative Compressive Sensing Recovery Algorithm for Wireless Sensor Networks with Bi-directional Incremental Topology
Recently, the problem of compressive sensing (CS) has attracted lots of attention in the area of signal processing. So, much of the research in this field is being carried out in this issue. One of the applications where CS could be used is wireless sensor networks (WSNs). The structure of WSNs consists of many low power wireless sensors. This requires that any improved algorithm for this appli...
متن کاملMixture of Xylose and Glucose Affects Xylitol Production by Pichia guilliermondii: Model Prediction Using Artificial Neural Network
Production of several yeast products occur in presence of mixtures of monosaccharides. To study effect of xylose and glucose mixtures with system aeration and nitrogen source as the other two operative variables on xylitol production by Pichia guilliermondii, the present work was defined. Artificial Neural Network (ANN) strategy was used to athematically show interplay between these three c...
متن کاملSolid Phase Equation of State Application for Wax Formation Prediction in Petroleum Mixtures
Precipitation ofsolid paraffins is one ofthe most common problems in the oil industry, imposing high operating costs. There have been a great many efforts for the prediction <span style="font-size: 10p...
متن کاملAn Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering
Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.
متن کاملDetermination of the number of components in finite mixture distribution with Skew-t-Normal components
Abstract One of the main goal in the mixture distributions is to determine the number of components. There are different methods for determination the number of components, for example, Greedy-EM algorithm which is based on adding a new component to the model until satisfied the best number of components. The second method is based on maximum entropy and finally the third method is based on non...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Statistics and Computing
دوره 13 شماره
صفحات -
تاریخ انتشار 2003